Store everest results in ERT storage #9161
Codecov Report
Attention: Patch coverage is

@@            Coverage Diff             @@
##             main    #9161      +/-   ##
==========================================
+ Coverage   91.67%   91.84%   +0.16%
==========================================
  Files         424      424
  Lines       26511    26748     +237
==========================================
+ Hits        24305    24566     +261
+ Misses       2206     2182      -24
7081e58 to 32193bd
5b54ee7 to dda3db9
src/everest/everest_storage.py (Outdated)

    mapping = {}
    for d in dummy_df.select("realization", "simulation_id").to_dicts():
        # Currently we work with str, but should maybe not be done in future
Do you need to keep this comment?
removed
src/everest/everest_storage.py (Outdated)

    def _store_gradient_results(self, results: FunctionResults) -> _GradientResults:
        perturbation_objectives = polars.from_pandas(
            results.to_dataframe("evaluations").reset_index()
        ).drop("plan_id")
no drop if possible.
src/everest/everest_storage.py (Outdated)

    # expected to be None?
    batch_objective_gradient = polars.from_pandas(
        results.to_dataframe("gradients").reset_index()
    ).drop("plan_id")
no drop if possible.
005c29e to 54c9c5c
LGTM!
ac09579 to af4a12a
src/everest/everest_storage.py (Outdated)

    def _enforce_dtypes(df: polars.DataFrame) -> polars.DataFrame:
        dtypes = {
            "batch_id": polars.UInt32,
            "result_id": polars.UInt32,
Could you drop all instances of storing result_id? I do not think it is needed anywhere and I am considering changing/dropping it in ropt, so it would be better not to depend on it.
35d6861 to 1bd76d6
1bd76d6 to 6bef204
Issue
Resolves #8811

Base idea/documentation:
Store datasets by [batch, realization, perturbation] x [controls, objectives, constraints, objective_gradient, constraint_gradient].

Exhaustive list of data stored PER BATCH:
- batch.json - contains info about the batch: batch_id and whether it is an improvement (aka merit flag, but the concepts are now unified for dakota and non-dakota runs)
- batch_constraints - constraint values (and violations) for constraints, batch-wide
- batch_objectives - objective values, batch-wide
- realization_controls - control values for geo-realizations, also includes simulation_id
- realization_objectives - objective values per geo-realization
- realization_constraints - constraint values per geo-realization
- perturbation_objectives - objective and control values per perturbation
- perturbation_constraints - constraint and control values per perturbation (Note/discussion point: control values could be pulled into a separate table to avoid redundancy)
- batch_objective_gradient - Partial derivatives of objectives, given different controls. This dataset has one column per objective and one row per control value; each cell holds the partial derivative of that objective wrt that control value.
- batch_constraint_gradient - Partial derivatives of constraints, given different controls. This dataset has one column per constraint and one row per control value; each cell holds the partial derivative of that constraint wrt that control value.

Example data from math_func/config_advanced.yml (json format)

Exhaustive list of data stored PER OPTIMIZATION:
- controls.json - control values for this batch
- realization_weights.json - realization weights
- nonlinear_constraints - conditions for constraints to satisfy (on average over the batch)
- objective_functions - objective function names, weights, and normalization

Example data from math_func/config_advanced.yml
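
To make the batch_objective_gradient layout described above concrete, here is a minimal sketch in plain Python. It is not code from this PR: the control names, objective names, derivative values, and the `control_name` column are all assumptions made up for illustration, and the rows are plain dicts of the kind polars returns from `to_dicts()` rather than a real dataframe.

```python
# Hypothetical control and objective names, used only for illustration.
CONTROLS = ["point.x", "point.y", "point.z"]
OBJECTIVES = ["distance", "cost"]

# Hypothetical partial derivatives, keyed by (objective, control).
PARTIALS = {
    ("distance", "point.x"): 1.0,
    ("distance", "point.y"): -2.0,
    ("distance", "point.z"): 0.5,
    ("cost", "point.x"): 0.0,
    ("cost", "point.y"): 3.0,
    ("cost", "point.z"): -1.5,
}


def gradient_rows(controls, objectives, partials):
    """Build one row per control with one column per objective."""
    return [
        {
            "control_name": control,  # assumed column name
            **{obj: partials[(obj, control)] for obj in objectives},
        }
        for control in controls
    ]


def partial_derivative(rows, objective, control):
    """Look up d(objective)/d(control) in the row-per-control layout."""
    for row in rows:
        if row["control_name"] == control:
            return row[objective]
    raise KeyError(control)


rows = gradient_rows(CONTROLS, OBJECTIVES, PARTIALS)
print(rows[1])
print(partial_derivative(rows, "cost", "point.z"))
```

The same shape would apply to batch_constraint_gradient, with one column per constraint instead of one per objective.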
Potential simplifications
The everest_data_api is currently used for plotting. With some expansion, it could also be used to avoid the direct (polars) dataframe manipulations that are currently done elsewhere in the code.
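
As a rough illustration of that idea, the sketch below wraps the rows of a batch_objectives-style table behind a small accessor so that callers ask for values by name instead of manipulating the table themselves. It is an assumption-laden sketch, not the actual everest_data_api: the class name `BatchObjectivesView`, its methods, and the objective name `distance` are hypothetical, and plain dicts stand in for the polars dataframe.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BatchObjectivesView:
    """Hypothetical read-only accessor; not the real everest_data_api."""

    # One dict per batch, e.g. as produced by DataFrame.to_dicts().
    rows: tuple

    def batch_ids(self):
        """Return all batch ids in ascending order."""
        return sorted(row["batch_id"] for row in self.rows)

    def objective_values(self, name):
        """Map each batch_id to the value of one objective."""
        return {row["batch_id"]: row[name] for row in self.rows}


# Hypothetical data: two batches with a single objective called "distance".
view = BatchObjectivesView(
    rows=(
        {"batch_id": 1, "distance": 2.0},
        {"batch_id": 0, "distance": 4.5},
    )
)

print(view.batch_ids())
print(view.objective_values("distance"))
```

With an accessor like this, the storage layout could change without touching plotting or reporting code, which is the simplification the paragraph above hints at.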